Rarity of Words in a Language and in a Corpus
Abstract
A simple method was presented last year (Hlaváčová, Rychlý) that automatically distinguishes between rare and common words having the same frequency in a language corpus. The method operates with two new terms: reduced frequency and rarity. Rarity was proposed as a measure of how rare or common a word is in a language. This article examines rarity in more depth: its value was calculated for several different corpora and compared. Two experiments were carried out on real data taken from the Czech National Corpus. The results of the first show that reordering the texts in the corpus does not influence the rarity of words with a high frequency in the corpus. In the second experiment, the rarity of the same words is compared in two corpora of different sizes.
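The abstract does not define reduced frequency, so the Python sketch below assumes one published formulation of it: the average reduced frequency (ARF) of Savický and Hlaváčová. ARF treats the corpus as a circle of N tokens, sets v = N/f for a word with frequency f, caps each gap between consecutive occurrences at v, and divides the sum of the capped gaps by v. Whether this matches the exact definition used in this article is an assumption, and rarity itself is not computed. The helper `positions_in` is hypothetical; it concatenates tokenized texts in a chosen order, which is the kind of text reordering the first experiment varies.

```python
from typing import List, Sequence


def average_reduced_frequency(positions: Sequence[int], corpus_size: int) -> float:
    """Average reduced frequency (ARF), assumed here as the 'reduced frequency'.

    positions: distinct 0-based token positions of one word's occurrences.
    corpus_size: total number of tokens N in the corpus.
    For f > 0 the result lies between 1 and f (the raw frequency).
    """
    pos = sorted(positions)
    f = len(pos)
    if f == 0:
        return 0.0
    v = corpus_size / f  # length of one of f equal chunks of the corpus
    # Gaps between consecutive occurrences; the first gap wraps around the
    # end of the corpus, so the corpus is treated as a circle.
    gaps = [pos[0] + corpus_size - pos[-1]]
    gaps += [b - a for a, b in zip(pos, pos[1:])]
    # Each gap contributes at most v, so clustered occurrences count less.
    return sum(min(d, v) for d in gaps) / v


def positions_in(texts: Sequence[Sequence[str]], word: str) -> List[int]:
    """Hypothetical helper: concatenate tokenized texts in the given order
    and return the positions at which `word` occurs."""
    found: List[int] = []
    offset = 0
    for text in texts:
        found.extend(offset + i for i, tok in enumerate(text) if tok == word)
        offset += len(text)
    return found


if __name__ == "__main__":
    n = 100
    spread = list(range(0, 100, 10))  # frequency 10, evenly dispersed
    clustered = list(range(10))       # frequency 10, all in one stretch
    print(average_reduced_frequency(spread, n))     # 10.0
    print(average_reduced_frequency(clustered, n))  # 1.9
```

Both example words occur 10 times in a 100-token corpus, yet the evenly spread word reaches the maximum value of 10.0 while the clustered word gets 1.9. This illustrates how a dispersion-aware measure can separate words that raw frequency alone cannot. Because ARF depends on token positions, calling `positions_in` with the same texts in a different order and recomputing the value is one way to probe the reordering question studied in the first experiment.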
Similar resources
Vocabulary Lists for EAP and Conversation Students
Despite the abundance of research investigating general and academic vocabularies and developing dozens of word lists, few studies have compared academic vocabulary with general service word lists such as conversation vocabulary. Many EAP researchers assume that university students need to know all the words in West’s (1953) General Service List (GSL) as a prerequisite to academic words (e.g., ...
A Corpus-Based Study of the Lexical Make-up of Applied Linguistics Article Abstracts
This paper reports results from a corpus-based study that explored the frequency of words in the abstracts of applied linguistics journal articles. The abstracts of major articles in leading applied linguistics journals, published since 2005 up to November 2001, were analyzed using software modules from the Compleat Lexical Tutor. The output includes a list of the most frequent content words, list...
Corpus based coreference resolution for Farsi text
"Coreference resolution" or "finding all expressions that refer to the same entity" in a text, is one of the important requirements in natural language processing. Two words are coreference when both refer to a single entity in the text or the real world. So the main task of coreference resolution systems is to identify terms that refer to a unique entity. A coreference resolution tool could be...
A'laam Corpus: A Standard Named-Entity Corpus for the Persian Language
Named entity recognition (NER) is a natural language processing (NLP) task whose output is mainly used in text summarization, data mining, data retrieval, question answering, machine translation, and document classification systems. An NER system must determine the boundaries of each named entity, recognize its type, and classify it into predefined categories. The categories of named...
How textbooks (and learners) get it wrong: A corpus study of modal auxiliary verbs
Many elements contribute to the relative difficulty of acquiring specific aspects of English as a foreign language (Goldschneider & DeKeyser, 2001). Modal auxiliary verbs (e.g., could, might) are an example of a structure that is difficult for many learners. Not only are they particularly complex semantically, but especially in the Malaysian context ...
Hedges in English for Academic Purposes: A Corpus-based study of Iranian EFL learners
Hedges, as tools for expressing tentativeness and doubt, have been studied in many research papers in the Iranian EFL research setting. However, their use in a learner corpus representing Iranian learner English needs more research attention. To this end, this study investigated how Iranian EFL learners who have majored in English-related fields in Iran deployed hed...